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Abstract — This paper describes performance bounds for com- 
pressed sensing in the presence of Poisson noise when the 
underlying signal, a vector of Poisson intensities, is sparse 
or compressible (admits a sparse approximation). The signal- 
independent and bounded noise models used in the literature to 
analyze the performance of compressed sensing do not accurately 
model the effects of Poisson noise. However, Poisson noise is an 
appropriate noise model for a variety of applications, including 
low-light imaging, where sensing hardware is large or expensive, 
and limiting the number of measurements collected is important. 
In this paper, we describe how a feasible positivity-preserving 
sensing matrix can be constructed, and then analyze the per- 
formance of a compressed sensing reconstruction approach for 
Poisson data that minimizes an objective function consisting of 
a negative Poisson log likelihood term and a penalty term which 
could be used as a measure of signal sparsity. 

I. Introduction 

The basic idea of compressed sensing is that, when the 
signal of interest is very sparse (i.e., zero-valued at most 
locations) or highly compressible in some basis, relatively 
few "incoherent" observations are sufficient to reconstruct the 
most significant non-zero signal components [1], [2]. Despite 
the promise of this theory for many applications, very little 
is known about its applicability to photon-limited imaging 
systems, where high-quality photomultiplier tubes (PMTs) 
are expensive and physically large, limiting the number of 
observations that can reasonably be collected by an imaging 
system. Limited photon counts arise in a wide variety of 
applications, including infrared imaging, nuclear medicine, 
astronomy and night vision, where the number of photons 
collected by the detector elements is very small relative to the 
number of pixels, voxels, or other quantities to be estimated. 
Robust reconstruction methods can potentially lead to many 
novel imaging systems designed to make the best possible use 
of the small number of photons collected while reducing the 
size and cost of the detector array. 

However, the signal-independent and bounded noise models 
which have been considered in the literature (cf. [3], [4]) are 
not easily adapted to the Poisson noise models used in photon- 
limited imaging. The Poisson model is often used to model 
images acquired by photon-counting devices [5]. Under the 
Poisson assumption, we can write our observation model as 



where /* S M." 1 is the signal or image of interest, A £ M. Nxm 
linearly projects the scene onto an A^-dimensional space of 
observations, and y S {0, 1,2,.. .} N is a length-iV vector of 
observed Poisson counts. Specifically, under the model in ([TJ, 
the likelihood of observing a particular vector of counts y is 
given by 



p(y\Af*) = n 



N {Ar)Vi 
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y ~ Poisson( J 4/*), 



(1) 



where (Af*)j is the j th component of Af*. 

The majority of the compressed sensing literature assumes 
that there exists a "sparsifying" reference basis W, so that 
9* = W T f * is sparse or lies in a weak-£ p space. When the 
matrix product AW obeys the so-called restricted isometry 
property (RIP) [6], [7] or some related criterion, and when 
the noise is bounded or Gaussian, then 9* can be accurately 
estimated from y by solving the following £2 — £i optimization 
problem (or some variant): 

6 = argminlly- AW9\\\ + r||0||i, (2) 



where r > is a regularization parameter [2], [8], [7]. 

However, the £2 data-fitting term, \\y — j4W#||f, is problem- 
atic in the presence of Poisson noise. Because the variance of 
the noisy observations is proportional to the signal intensity, 
£2 data-fitting terms can lead to significant overfitting in high- 
intensity regions and oversmoothing in low-intensity regions. 
Furthermore, photon-limited imaging systems implicitly place 
hard constraints on the nature of the measurements that can 
be collected, such as non-negativity, which are not considered 
in much of the existing compressed sensing literature (recent 
papers of Dai and Milenkovic [9] and of Khajehnejad et al. 
[10] are notable exceptions). 

In this paper, we propose estimating /* from y using a 
regularized Poisson log-likelihood objective function as an 
alternative to Q, and we present risk bounds for recovery of 
a compressible signal from Poisson observations. Specifically, 
in the Poisson noise setting we maximize the log-likelihood 
while minimizing a penalty function that, for instance, could 



measure the sparsity of 9 = W T f: 



N 



f 



axgmin (-y 3 ;log(Af)j) + r pen(/) 



subject to AfhO,fh 0, E™ i fi=I 

where pen(-) is a penalty function that will be detailed later, 
I is the total intensity of the unknown /* (assumed known), 
and the standard notation v >z means that the components 
of v are nonnegative. The constraints reflect the nonnegativity 
of both the observed intensity and the underlying image and 
the known total intensity of the underlying image. 

II. Problem formulation 

We have a signal or image /* of length m that we wish to 
estimate using a detector array of length N <C m. We assume 
that /* y 0. We will bound the accuracy with which we 
can estimate /*//, where I = EZLi f?> m other words, we 
focus on accurately estimating the shape of /* independent 
of any scaling factor proportional to the total intensity of the 
scene. We assume that the total intensity I is known, and 
our candidate estimators will also be constrained to have total 
intensity I. The quality of a candidate estimator / will be 
measured in terms of the risk 
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smaller for sparser solutions 9 = W T /, where W is an 
orthogonal matrix that represents / in its "sparsifying" basis, 
our main result only assumes that (|5]l is satisfied. We can 
think of Q as a discretized-feasibility version of pj, where 
we optimize over a countable set of feasible vectors that grows 
in a controlled way with signal length m. 

III. Properties of the sensing matrix A 

Our main result, stated and proved in the next section, 
makes use of the several properties of the sensing matrix A 
(and ^4). The most important of these properties is that, with 
high probability, A acts near-isometrically on certain subsets 
of R m . The usual formulation of this phenomenon is known 
in the compressed sensing literature as the restricted isometry 
property (RIP) [6], [7], where the subset of interest consists 
of all vectors with a given sparsity. In fact, the RIP is a 
special case of a much broader circle of results concerning 
the behavior of random matrices whose entries are drawn 
from a subgaussian isotropic ensemble [11]. The Rademacher 
ensemble is an instance of this, and the following two theorems 
can be extracted from the results of [11]: 

Theorem 1 There exist absolute constants c\,C2 > 0, such 
that, with probability at least 1 — e~ ClN , 

\\u - vh < V2N\\A(u - v)\\ a + c^^f^ 



We construct our sensing matrix A as follows. Let Z e 
{— l,+l} Nxm be a matrix whose entries Z^j are indepen- 
dent Rademacher random variables, i.e., P[Z i:j = -1] = for all u, v g E m such that \\u\\\ = \\v\\\ = 1 
P [Zij = +1] = 1/2 independently of all other Z { ^y. Let A = 
(1/N)Z. Most compressed sensing approaches would proceed 
by assuming that we make (potentially noisy) observations 
of the product Af*, but elements of Af* could be negative 
and thus not physically realizable in photon-counting systems. 
However, we can use A to generate a positivity-preserving 
sensing matrix A as follows. Let l rxs denote the r x s matrix 
all of whose entries are equal to 1. Then we let 



-c 3 N 



Theorem 2 There exist absolute constants 03,04 > 0, such 
that the following holds. Let S be a finite subset of the unit 
sphere in K m . Then, with probability at least 1 — 

1/2 < N\\As\\l < 3/2, Vs € S 

provided N > C4 log 2 |<S|. 



A = A + (l/iV)ljVxm- 

Note that A e {0, 2/N} Nxm and, as a consequence, A indeed 
preserves positivity: for any / £ RT\ Af y 0. 

We make Poisson observations of Af*, y ~ Poisson(A/*), 
and our goal is to estimate /* e from y g {0, 1,2,.. .} N . 
To this end, we propose solving the following optimization 
problem: 



We will also rely on the following properties of A and A: 

• With probability at least 1 — N2~ m , every row of Z has 
at least one positive entry. Let / 6 W n be an arbitrary 
vector of intensities satisfying / >z (c/)l mx i for some 
c> 0. Then 

Af h {2cI/N)l Nxl . 



f = arg min 
fer 



-logp(y\Af) + 2pen(f) 



With probability at least 1 - 2me~ Ar/8 , 



(4) 



where pen(/) is a penalty term. We assume that V = T(m, J) 
is a countable set of feasible estimators / € M" 1 satisfying 
E™ 1 fi = I, and that the penalty function satisfies the Kraft 
inequality: 

^ e - P on(/)< L (5) 

fer 

Note that, by construction of A, f E T implies that Af >z 0. 
Furthermore, while the penalty term may be chosen to be 
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<i/4, Vje{i,. 



(6) 



(7) 



(this is a simple consequence of the Chernoff bound and 

the union bound). 

If the event |7]) holds, then 



A' 



(3/4)/ < 253^/, < (5/4)/, V/ei;» (8) 
i=i 3=1 



IV. An oracle inequality for the expected risk 

We now state and prove our main result, which gives 
an upper bound on the expected risk ER(f*,f) that holds 
for any target signal f* >z satisfying the normalization 
constraint Y^TLi ft = I> without assuming anything about 
the sparsity properties of /*. Conceptually, our bound is an 
oracle inequality, which states that the expected risk of our 
estimator is within a constant factor of the best regularized 
risk attainable by estimators in T with full knowledge of the 
underlying signal /*. More precisely, for each / S T define 



R*(f*J) = 







f 




I 


I 



2pen(/) 



and for every F C T let #*(/*, T) = min /eI v #*(/*, /)■ 
Note that R*(f*,V) is the best penalized risk that can be 
attained over V by an oracle that has full knowledge of /*. 
We then have the following: 

Theorem 3 Suppose that the feasible set T also satisfies the 
condition 

fh(d)i mxl , v/er (9) 

for some < c < 1. Let Qn be the collection of all subsets 
V C T, such that \T'\ < 2 W / C4 . Then the following holds 



with probability at least 1 



me 



-KN 



for some positive K 



K{c\,Cs) (with respect to the realization of A): 

24log(c 2 m/N) 



ER(f*J)<C N min R*{f*,T') 



N 



, (10) 



where Cn — max(20, l5/c)N, and the expectation is taken 
with respect to y ~ Poisson(A/*). 

Remark 1. A positivity condition similar to |9} is natural 
in the context of estimating vectors with nonnegative entries 
from count data. In particular, it excludes the possibility of 
assigning zero intensity to an input of a detector when at least 
one photon has been counted [12]. However, as will be clear 
from the proof below, condition Q can be replaced with a 
more general (weaker) condition 

Afh(c'i/N)i Nxl , v/er 

for some c' > 0, which is more appropriate when the signal 
/* is sparse in the canonical basis, because then it is in fact 
desirable to allow candidate estimators with zero components. 

Proof: With high probability, the following chain of 
estimates holds: 

^11/* -/II! 



< 



< 



AN 
-p\ 

AN 
-p\ 

4N 

J2\ 



A(f* 


-/)lllH 


2c\ log(c 2 m/7V) 


N 


A(f* 


-/)lllH 


2c\ log(c 2 m/7V) 


N 


A(f 


-/)ll?H 


2c\ \og{c 2 m/N) 


N 



where the first inequality is a consequence of Theorem [T] and 
the remaining steps follow from definitions and from standard 
inequalities for £ p norms. Moreover, with high probability, 




(Af) 



1/2 



(An] /2 + (Af) 



1/2 



(Af)i /2 (Anr + (Af) 



1/2 



,1/2 



< 5ij2 un 1 / 2 - (Af)] /2 



where the first inequality is due to Cauchy-Schwarz, and the 
second inequality is a consequence of and the inequality 
between the arithmetic mean and the geometric mean. It is a 
matter of straightforward algebra to show that 



N 



£ (4f) 



1/2 



1/2 



= -21og£ 



= 2 log 



exp 



{AnY 2 -{Af)] /2 



p{y\Af*)p(y\Af)dv{y) 



where v is the counting measure on {0, 1, 2, . . .} N . Now, the 
same techniques as in Li and Barron [13] (see also the proof 
of Theorem 7 in [14]) can be used to show that 

2 :1,,; | / Jp(y\Af*)p(y\Af)dis(y) * 



< min 
/er 



KL p(-|Af) p(-\Af) +2pen(/) 



(11) 



where KL(-|j •) is the Kullback-Leibler (KL) divergence, which 
for the Poisson likelihoods has the form 



KL{p(-\Ar)\p(-\Af) 

N r 



(AT) 



— (Af*)i + (Af)i 



Using the inequality log t < t — 1 together with (|9]l and (|6j, 
we can bound the KL divergence as 



N 

E 



W)ilog 



(Af* 



< 



N 

E 

i=l 
N 



(An 



(Af)i 
((Af 



\(Af) 



(Af*)i + (Af) 

- (Af*)i + {Af)i 



E TTJV H A f)i 2 (AfUA.rh + (Af*) 2 ] 
i=i lAJ H 



^\\A(r-f)t, 



Now, choose any T* E Gn, sucn that 

R*(f*,T*) = min R*(f*,T'). 



Then, applying Theorem 
have, with high probability, that 

iV||^(/* - < (3/2)||/* - /|||, 



to the set [jf^jy ■ / G T*|, 

v/ e r*. 



Combining everything, we get the bound Ei?,(/*,/) < 

2 



max 20, 



15 



N min 



/* 



2pcn(/) 
I 



2c 2 ! log(c 2 m/iV) 
N 



which holds with high probability w.r.t. the realization of A. 
Let Cat = max(20, 15 /c)N. The theorem is proved. ■ 

V. Risk bounds for compressible signals 

We now show how the bound in Theorem [3] can be used 
to analyze how the performance of the proposed estimator 
when the target signal /* is compressible (i.e., admits a sparse 
approximation) in some reference orthonormal basis. 

Following [1], we assume that there exists an orthonormal 
basis $ = {</>i, . . . , <fi m } of W n , such that /* is compressible 
in $ in the following sense. Let W be the orthogonal matrix 
with columns <j>x, , . . , <p m . Then the vector 9* of the coeffi- 
cients 9* = (f*,(/>j) of /* in $ is related to /* via /* = W6*. 
Let 8L, . . . , 9* m) be the entries of 9* arranged in the Order- 



Cm) 



of decreasing magnitude: _ \0 



'(2) 



> 



> 



'(m)l 



We 



assume that there exist some < q < oo and p > 0, such that 
for each 1 < j < m 

< pir 1/q - 



Note that for every 1 < j < to we have 



(12) 



7 (.;) 



< 



irii 2 <iiriii = /, 



so we can take p to be a constant independent of / or to. 
Any 9* satisfying (12i is said to belong to the weak-l q ball 



of radius pi. The weak-£ g condition (12i translates into the 



following approximation estimate: given any 1 < k < to, let 
#( fc ) denote the best A: -term approximation to 9*. Then we can 
show that 

2 



0« 



< Cp 2 k~ 



= l/ g -l/2 (13) 



for some constant C > that depends only on q. We also 
assume that /* satisfies the condition (|9| for some c S (0,1), 
a lower bound on which is assumed known. 

In order to apply Theorem [3] we will form a suitable finite 
class of estimators T and set a penalty function pen(/) over 
this class which (a) is smaller for sparser 9 = W T f and (b) 
satisfies d5J. The family T is constructed as follows. 

1) Define the sets 



6 = {0e 



each 9i uniformly quantized to one of ^/rn levels} 



and T = {/ e E m : / = W9, 9 G 6}. 
2) For each / G T, let / denote the £2 projection of / onto 
the closed convex set 

C=\ge E m : g h (cl)l mxl and = A , 



i.e., 



/ = axgmin||/-p|| 2 . 

gee 

3) Finally, let T = {9 = W T f : f eJ 7 }. 
Note that the projection / satisfies the Pythagorean identity 

Ilfl-/ll2>ll$-/ll!+ll/-/lla, V 3 eC 

(see, e.g., Theorem 2.4.1 in [15]). In particular, \\g — /||| > 
|| g — /|| 2, and, since /* G C, we have 



II/* - /II < II/* - /|| 



V/eJP. 



(14) 



Consider the penalty 

pen(/) = log 2 (m + 1) + (3/2)||0|| o log a (m), 9 = W T f. 

This corresponds to the following prefix code for 9 G (that 
is, we encode the elements of 0, before they are subjected to 
the deterministic operation of projecting onto C): 

1) First we encode ||0||o, the number of nonzero compo- 
nents of 9, which can be encoded with log 2 (™+ 1) bits. 

2) For each of the ||0||o nonzero components, we store 
its location in the 9 vector; since there are m possible 
locations, this takes log 2 (m) bits per component. 

3) Next we encode each coefficient value, quantized to one 
of y/m uniformly sized bins. 

Since this corresponds to a uniquely decodable code for / G J 7 
(or 9 G 0), we see that pcn(/) satisfies the Kraft inequality. 

Now, given 9* = W T f*, let 6K fc ) be its best /c-term 
approximation, 9^ G the quantized version of 6K fc ), for 
which we have 



Jk) 
'1 

I 



< 



and 9q k ^ the element of T obtained by projecting fq 
W9 q kS> onto C and then transforming back into the basis <I>: 



(*0 



7 q 



= W T fq 



T J(fe) 
? ■ 

Wl 



Then, using 



II/* -/«H 



< 



< 



< 



I/' 



-mi 

*_0(fc)||2 

2Ck~ 2a - 



and (1 1 3b, we get 



h2||6» (fe) -6»( fe) ||^ 
2fc N 



Given each 1 < fc < to, let T k C T be the set of all 9 G 
r, such that the corresponding 9 G satisfies ||#||o < k. 
Then \T k \ = (™)m k l 2 , so that log 2 |r fe | < 2klog 2 m, and 
therefore T k G Qn whenever k < k*(N), where k*,(N) = 



Nj (2c4 log 2 to). Then the first term on the right-hand side of 
(TTOl can be bounded by 



C N min R*(f*,T k ) 

l<k<k t (N) 



< O(N) min 

l<fc<fe»(iV) 



< O(N) min 

l<fc<fe»(iV) 



-2a 



I 

k 



2 P cn(/j fc) ) 
I 



k log 2 to 



where the constant obscured by the O(-) notation depends only 
on C and c. We can now consider two cases: 
1) I < to log to, i.e., the penalty term dominates the quanti- 
zation error, then we get the risk bound Ei?(/*, /) < 



-2a 



O(N) min 

l<fc<fc„(ATj 



If K(N) > {aI/\og. 2 m) 1 / { - 2c 



ER(f,f)<0(N) 



2fclog 2 to 



2c 2 i log(c 2 TO/7V) 
N ' 



log TO 



x ), then we can further obtain 

_5 ^t 2c^log(c 2 TO/7V) 
+ N ' 



If k*(N) < (aI/log 2 m) 1 /( 2a+1 \ there are not enough 
measurements, and the estimator saturates, although its risk 
can be controlled. 

2) I > 772 log to, i.e., the quantization error dominates the 
penalty term. Then we obtain ER* (/*,/) < 



O(N) min 

l<fc<fe,(Af) 



k 



-2a 



2k 



2c\ log(c 2 m/N) 
N ' 



If k*(N) > (am) 1 ^ 2a+1 \ then we can further get 

2c! log(c 2 TO/iV) 



ER(f*,f)<0(N)m- 



N 



Again, if k*(N) < (am) 1 ^ 2a+1 \ there are not enough 
measurements, and the estimator saturates. 

Note that, when I X to and N x m 1 ' p for some p > 
1 + l/2a, we get (up to log factors) the rates 



ER(f*,f) = 0(i 



-0\ 



where (3 



2a-(2a+l)/p „ 
2a+l ^ U ' 



VI. Conclusion 



We have derived upper bounds on the compressed sensing 
estimation error under Poisson noise for sparse or compress- 
ible signals. We specifically prove error decay rates for the 
case where the penalty term is proportional to the ^o- norm 
of the solution; this form of penalty has been used effec- 
tively in practice with a computationally efficient Expectation- 
Maximization algorithm (cf. [16]), but was lacking the theoret- 
ical support provided by this paper. Furthermore, the main the- 
oretical result of this paper holds for any penalization scheme 
satisfying the Kraft inequality, and hence can be used to 
assess the performance of a variety of potential reconstruction 
strategies besides sparsity-promoting reconstructions. 

One significant aspect of the bounds derived in this paper 
is that they grow with N, the size of the measurement 



array, which is a major departure from similar bounds in the 
Gaussian or bounded-noise settings. It does not appear that 
this is a simple artifact of our analysis. Rather, this behavior 
can be intuitively understood to reflect that elements of y will 
all have similar values at low light levels, making it very 
difficult to infer the relatively small variations in Af* . Hence, 
Poisson compressed sensing using shifted Rademacher sensing 
matrices is fundamentally difficult when the data are very 
noisy. It may be possible to address these limitations through 
alternative constructions of sensing matrices which introduce 
more variation in the signal Af*. 
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